Character Segmentation Scheme for OCR System : For Myanmar Printed

نویسندگان

  • Phyo Thu Thu Khine
  • Khin Nwe Ni Tun
چکیده

Automatic machine-printed Optical Characters or texts Recognizers (OCR) are highly desirable for a multitude of modern IT applications, including Digital Library software. However, the state of the art OCR systems cannot do for Myanmar scripts as the language poses many challenges for document understanding. Therefore, the authors design an Optical Character Recognition System for Myanmar Printed Document (OCRMPD), with several proposed techniques that can automatically recognize Myanmar printed text from document images. In order to get more accurate system, the authors propose the method for isolation of the character image by using not only the projection methods but also structural analysis for wrongly segmented characters. To reveal the effectiveness of the segmentation technique, the authors follow a new hybrid feature extraction method and choose the SVM classifier for recognition of the character image. The proposed algorithms have been tested on a variety of Myanmar printed documents and the results of the experiments indicate that the methods can increase the segmentation accuracy as well as recognition rates. DOI: 10.4018/978-1-4666-3906-5.ch018

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Character Segmentation Scheme for OCR System: For Myanmar Printed Documents

Automatic machine-printed Optical Characters or texts Recognizers (OCR) are highly desirable for a multitude of modern IT applications, including Digital Library software. However, the state of the art OCR systems cannot do for Myanmar scripts as the language poses many challenges for document understanding. Therefore, the authors design an Optical Character Recognition System for Myanmar Print...

متن کامل

Bilingual OCR System for Myanmar and English Scripts with Simultaneous Recognition

Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun AbstractThe increasing amount of development of the digital libraries worldwide raises many new challenges for document image analysis research and development. Storing wide variety of document images in Digital library, for example, for cultural, technical or historical, that are written in many languages, also create many advancement for pre...

متن کامل

OCR for printed Kannada text to Machine editable format using Database approach

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system for the recognition of printed Kannada text, which can handle all types of Kannada characters. The system first extracts image of Kannada scripts, then from the image to line segmentation then segments the words into sub-character level piece...

متن کامل

A Structural Analysis Based Feature Extraction Method for OCR System For Myanmar Printed Document Images

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors to achieve high recognition performance in Optical Character Recognition (OCR) system is the selection of the feature extraction methods. Different types of existing OCR systems used various feature extraction methods because of the diversity of the script...

متن کامل

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016